This working paper discusses the theoretical problem of the regionalisation of a world (in the abstract sense) through the empirical example of The World (where we live), described by trade flows over a long period of time and for different types of products.
For that purpose we use the CHELEM database produced by CEPII, which offers exceptional coverage of trade flows over a period of more than 50 years, from 1967 to the present (2020). The most detailed version of this database describes the exchanges between 94 x 94 territorial units (states or groups of states) for 72 types of goods over 54 years, which yields a 4-dimensional object (hypercube) of \(94 \times 93 \times 72 \times 54 = 33988896\) cells.
For our experiment, we use a reduction of the database based on 12 territorial units described by 9 groups of goods for 5 periods of 10 years each. The hypercube used in our experiment is therefore limited to \(12 \times 11 \times 9 \times 5 = 5940\) cells. This may appear rather limited but, as we will demonstrate, the complexity of such an object is already very high, and it seems better to establish the theoretical foundations of the research on such an object before addressing larger databases where computational problems will grow exponentially.
Our overarching question can now be formulated in the following way:
Let \(W\) be a world divided into \(1\dots i\dots n\) territorial units.
Let \(F\) be a relation defined on \(W \times W\) which assigns a value to each ordered pair of units of the world (excluding only internal relations).
Let \(X\) be a typology of relations in \(1\dots k\dots p\) types of relation using the same unit of measurement.
Let \(T\) be a partition of time into periods \(1\dots t\dots q\) in which the relations are measured.
Let \(H = F_{ijkt}\) be the hypercube which measures the relation between territorial units \(i\) and \(j\) for the relation type \(k\) during time period \(t\).
Problem 1: What are the partitions \(P_i\) (for origins), \(P_j\) (for destinations), \(P_k\) (for the typology) and \(P_t\) (for time periods) that allow us to reduce the size of the initial hypercube \(H\) to a smaller one \(H'\) without losing too much information?
Problem 2: Can we identify homogeneous subparts of \(H\) that are not necessarily based on orthogonal divisions of the hypercube?
Problem 3: Can we identify trajectories of regionalisation \(P_i(t)\), \(P_j(t)\) or trajectories of typology \(P_k(t)\) that describe the evolution of optimal partitions through time?
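As an illustration of Problem 1, the reduction of \(H\) to \(H'\) can be sketched as a simple aggregation. This is a minimal sketch, not the authors' implementation: the function name `aggregate_hypercube`, the dictionary representation of the hypercube and the toy labels are all hypothetical, and it assumes that origins and destinations use the same class labels so that flows internal to a class can be dropped.

```python
from collections import defaultdict


def aggregate_hypercube(flows, part_i, part_j, part_k, part_t):
    """Aggregate F[i, j, k, t] into a smaller hypercube H'.

    flows  : dict mapping (i, j, k, t) -> value
    part_* : dicts mapping each original label to its class label
    Assumption: part_i and part_j share the same class labels, so a flow
    whose origin and destination fall in the same class is internal
    and is removed, as for the original diagonal.
    """
    reduced = defaultdict(float)
    for (i, j, k, t), value in flows.items():
        gi, gj = part_i[i], part_j[j]
        if gi == gj:
            continue  # internal flow of the aggregated unit
        reduced[(gi, gj, part_k[k], part_t[t])] += value
    return dict(reduced)


# Toy example with hypothetical labels (not CHELEM data)
toy_flows = {
    ("A", "B", "x", 1): 10,
    ("B", "A", "x", 1): 4,
    ("A", "C", "x", 1): 3,
    ("C", "A", "y", 2): 5,
}
units = {"A": "R1", "B": "R1", "C": "R2"}
goods = {"x": "X", "y": "X"}
periods = {1: "T", 2: "T"}
toy_reduced = aggregate_hypercube(toy_flows, units, units, goods, periods)
```

In the toy example, the two flows between A and B disappear because both units fall in class R1, which shows how aggregation removes information as well as cells.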
The original version of the CHELEM database is made of 94 territorial units. A majority of these territorial units correspond to states, but some of them are aggregates of states for which it was difficult to separate or to collect trade flows. The map below indicates which territorial units do not fit the international division of the world into states.
The aggregates of states are generally based on groups of small states (as in Central America or Oceania), but this can also be the case for larger groups of states playing an important role in trade, as with the aggregate of Iraq, Iran and Kuwait. The aggregation is also very large in the case of sub-Saharan Africa, where only a few states are identified and the others are mixed into large areas that are not necessarily contiguous. At the same time, Europe is fully disaggregated into individual states, except in the case of Malta and Cyprus, which has the consequence of increasing trade flows in this part of the world. If the USA were divided into 51 federal states, and China or India into provinces or states, it would necessarily increase their share of exchanges.
We are therefore facing a difficult instance of the Modifiable Areal Unit Problem (MAUP), which cannot easily be solved without deciding from the start to aggregate the data into larger, more homogeneous units where internal flows are systematically removed. This of course produces a strong reduction of the initial information, but it makes possible a better analysis of the relations between the new territorial units.
On the basis of expert advice, we have chosen 12 basic territorial units, which are in fact associated with a first division of the world into 4 regions, each of them divided into 3 subregions.
The authors of this partition suggest that the world economy has been (at least for a period of time), or could have been (wishful thinking?), organized around three integrated “vertical macroregions” and one residual, less integrated part of the world subject to the variable influence of the three vertical regions:
G1: Europe-Mediterranea-Africa: Clearly inherited from history, this vertical region is based on various types of proximity including geographical distance, a common sea (Mare Nostrum), common languages, colonial legacy … But what has been the fate of these links over the last 50 years, following the independence of the African states?
G2: Americas: Since the 19th century, “the Monroe Doctrine is a United States foreign policy position that opposes European colonialism in the Western Hemisphere. It holds that any intervention in the political affairs of the Americas by foreign powers is a potentially hostile act against the United States” (Wikipedia). This doctrine has been related to many conflicts between the different parts of the Americas, but it has also been associated with the building of various forms of cooperation like NAFTA (1994), MERCOSUR (1991), etc. In any case, geographical proximity clearly favoured a potential integration here. But the reduction of transport costs in the 1980s has modified the role played by these factors in favor of trans-Pacific relationships. So, what is the situation of the integration of the Americas over our 50-year period of interest?
G3: Asia-Pacifica: The economic integration of this part of the world is a long and complex process, initially boosted by Japan and Korea, then by China, and associated with a continuous process of development of free trade areas like ASEAN. This potential macroregion has been at the same time the pivot of the global economic integration of the world, first through trans-Pacific relations until 1990 and then with the rest of the world through the growing influence of China after it joined the WTO in 2001. So, is it still a macroregion, or the economic core of the contemporary world?
G4: Rest of the World: We cannot speak here of an integrated economic region but rather of a group of states that (1) benefit from resources of interest for the rest of the world (e.g. oil and gas from the Gulf, mineral products from Russia, …) and/or (2) develop a strategy of diversifying their exchanges at world scale and refuse to depend on overly powerful partners (e.g. the strategies of India, Russia or Saudi Arabia). The question here is to what extent this part of the world has remained “neutral” as compared to the three other ones, or has been successfully associated with the other regions according to variable geometries.
All these remarks are hypotheses that suggest a possible way to cluster the 12 territorial units into 3 or 4 groups. Our aim here is not to validate the partition \((G_1,G_2,G_3,G_4)\) but rather to use it as a starting point for the discovery of alternative geometries that change through time or present variable configurations according to the type of products considered.
The authors of the CHELEM database have made incredible efforts to maintain a homogeneous categorisation of goods in 72 types of products over a period of 50 years. Considering the changes of the world economy and the evolution of the nomenclatures used by trade organizations, it is a genuine miracle to have done such work. We adopt here a simplified version of the CHELEM typology with only 9 groups of products that reflect the distribution of value chains as well as the international division of labor (Grasland and Van Hamme 2010; Grataloup, Boucheron, and Fumey 2014).
On the basis of the previous rules we have built the expected hypercube with 5940 cells. The flows have been normalized to an arbitrary total sum of 1000000 for each period of ten years and the values have been rounded to zero decimals. For each pair of regions we have introduced the flows in both directions, \(F_{ijkt}\) and \(F_{jikt}\), in order to compute easily the symmetric part of the exchange, called volume, and the asymmetric part, called balance:
| i | j | k | t | Fijkt | Fjikt | Vijkt | Bijkt |
|---|---|---|---|---|---|---|---|
| G11:Europe | G12:Medit.SE | (1) ENE | 1971-80 | 1239 | 16658 | 8948 | -15419 |
| G11:Europe | G12:Medit.SE | (1) ENE | 1981-90 | 904 | 16290 | 8597 | -15386 |
| G11:Europe | G12:Medit.SE | (1) ENE | 1991-00 | 652 | 8143 | 4398 | -7490 |
| G11:Europe | G12:Medit.SE | (1) ENE | 2001-10 | 1371 | 9031 | 5201 | -7660 |
| G11:Europe | G12:Medit.SE | (1) ENE | 2011-20 | 1942 | 5057 | 3499 | -3115 |
| G11:Europe | G12:Medit.SE | (2) MIN | 1971-80 | 3849 | 1489 | 2669 | 2361 |
| G11:Europe | G12:Medit.SE | (2) MIN | 1981-90 | 3196 | 1187 | 2192 | 2009 |
| G11:Europe | G12:Medit.SE | (2) MIN | 1991-00 | 2234 | 875 | 1554 | 1359 |
| G11:Europe | G12:Medit.SE | (2) MIN | 2001-10 | 2188 | 1169 | 1679 | 1019 |
| G11:Europe | G12:Medit.SE | (2) MIN | 2011-20 | 1978 | 936 | 1457 | 1042 |
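The rows above are consistent with the definitions \(V_{ijkt} = (F_{ijkt} + F_{jikt})/2\) and \(B_{ijkt} = F_{ijkt} - F_{jikt}\). These formulas are inferred from the tabulated values rather than stated in the text, and the function name below is hypothetical. A minimal sketch:

```python
def volume_and_balance(f_ij, f_ji):
    """Split a pair of directed flows into a symmetric and an asymmetric part.

    Assumption (inferred from the table): volume is the mean of the two
    directed flows and balance is their difference.
    """
    volume = (f_ij + f_ji) / 2
    balance = f_ij - f_ji
    return volume, balance


# Energy flows between G11 and G12 in 2001-10, as listed in the table
v, b = volume_and_balance(1371, 9031)
print(v, b)  # 5201.0 -7660
```

Some rows differ from this computation by one unit (for example the balance of 2361 for MIN in 1971-80, against \(3849 - 1489 = 2360\)), which suggests that volume and balance were computed before the flows were rounded.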
Before addressing the search for an unknown partition, we discuss how to measure the accuracy of an existing partition, which will help us to specify the problem of choosing an optimisation criterion.
We take as an example the bilateral trade volumes (\(V_{ijkt}\)), in order to have the same partition for origins and destinations (the problem of asymmetry will be discussed later), and consider the total sum of flows in 1991-2000 as a starting example. The existing partition is the division into 4 regions (3 vertical regions + 1 residual).
| | G11 | G12 | G13 | G21 | G22 | G23 | G31 | G32 | G33 | G41 | G42 | G43 | Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G11 | 0 | 24373 | 11640 | 64562 | 5708 | 11391 | 50525 | 16925 | 4837 | 16958 | 11407 | 5747 | 224073 |
| G12 | 24373 | 0 | 455 | 5332 | 146 | 838 | 3061 | 864 | 220 | 1375 | 1521 | 566 | 38751 |
| G13 | 11640 | 455 | 0 | 4183 | 112 | 677 | 3995 | 893 | 197 | 175 | 561 | 853 | 23742 |
| G21 | 64562 | 5332 | 4183 | 0 | 33752 | 13169 | 73913 | 18473 | 4013 | 1748 | 5233 | 2960 | 227336 |
| G22 | 5708 | 146 | 112 | 33752 | 0 | 2459 | 4733 | 709 | 118 | 857 | 314 | 96 | 49003 |
| G23 | 11391 | 838 | 677 | 13169 | 2459 | 0 | 5105 | 926 | 226 | 372 | 774 | 339 | 36275 |
| G31 | 50525 | 3061 | 3995 | 73913 | 4733 | 5105 | 0 | 37009 | 8435 | 3648 | 13332 | 3444 | 207199 |
| G32 | 16925 | 864 | 893 | 18473 | 709 | 926 | 37009 | 0 | 3282 | 755 | 3671 | 1853 | 85360 |
| G33 | 4837 | 220 | 197 | 4013 | 118 | 226 | 8435 | 3282 | 0 | 46 | 558 | 382 | 22316 |
| G41 | 16958 | 1375 | 175 | 1748 | 857 | 372 | 3648 | 755 | 46 | 0 | 448 | 501 | 26882 |
| G42 | 11407 | 1521 | 561 | 5233 | 314 | 774 | 13332 | 3671 | 558 | 448 | 0 | 2252 | 40070 |
| G43 | 5747 | 566 | 853 | 2960 | 96 | 339 | 3444 | 1853 | 382 | 501 | 2252 | 0 | 18994 |
| Sum | 224073 | 38751 | 23742 | 227336 | 49003 | 36275 | 207199 | 85360 | 22316 | 26882 | 40070 | 18994 | 1000000 |
Assuming that flows are made of 1000000 discrete events (the total sum of the matrix), we choose as reference (null model) a situation where the exports \(O_i\) and imports \(D_j\) of each spatial unit are known (margins of the matrix) and where the exchanges are randomly distributed. Because no information is available on the diagonal of the matrix (trade internal to each region), the model cannot be solved by a simple estimation and requires an iterative doubly constrained model of the form
\(F_{ij}^* = a_i.O_i.b_j.D_j+\epsilon_{ij}\)
Analysis of Deviance Table
Model: poisson, link: log
Response: Vij
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 131 2101056
i 11 815299 120 1285757 < 2.2e-16 ***
j 11 1014053 109 271704 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square = 0.871"
This first model accounts for 87% of the initial deviance, which is a large share but a logical one considering the unequal size of the territorial units in terms of trade volume.
The analysis of standardized residuals makes it possible to visualize the pairs of units where exchanges are higher or lower than expected. A classification of this matrix of residuals reveals a structure in “blocks” of units that have more internal exchanges than expected.
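The doubly constrained reference model can be sketched with iterative proportional fitting on a matrix whose diagonal is structurally empty. This is a minimal sketch and not the authors' code: the deviance tables suggest that the model was fitted as a Poisson log-linear model, for which iterative proportional fitting is a standard equivalent way to obtain the fitted values. The function name and the four-unit example (a subset of the 1991-2000 matrix: G11, G21, G31, G41) are illustrative choices. In the notation of the model, `r[i]` plays the role of \(a_i.O_i\) and `c[j]` the role of \(b_j.D_j\).

```python
def fit_double_constraint(obs, n_iter=1000, tol=1e-12):
    """Fit F*[i][j] = r[i] * c[j] for i != j, with an empty diagonal.

    The fitted matrix reproduces the row and column sums of obs
    (computed without the diagonal). obs is a square list of lists.
    """
    n = len(obs)
    row_tot = [sum(obs[i][j] for j in range(n) if j != i) for i in range(n)]
    col_tot = [sum(obs[i][j] for i in range(n) if i != j) for j in range(n)]
    r = [1.0] * n
    c = [1.0] * n
    for _ in range(n_iter):
        # Rescale origins so that row sums match, then destinations
        r_new = [row_tot[i] / sum(c[j] for j in range(n) if j != i)
                 for i in range(n)]
        c = [col_tot[j] / sum(r_new[i] for i in range(n) if i != j)
             for j in range(n)]
        change = max(abs(a - b) / a for a, b in zip(r_new, r))
        r = r_new
        if change < tol:
            break
    return [[0.0 if i == j else r[i] * c[j] for j in range(n)]
            for i in range(n)]


# Illustrative subset of the 1991-2000 volumes: G11, G21, G31, G41
obs = [
    [0, 64562, 50525, 16958],
    [64562, 0, 73913, 1748],
    [50525, 73913, 0, 3648],
    [16958, 1748, 3648, 0],
]
fitted = fit_double_constraint(obs)
```

Comparing `obs` with `fitted` cell by cell gives the raw residuals from which blocks of over-represented exchanges can be read.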
We notice here that the classification of residuals fits the expectations of the experts relatively well, as we can recognize on the diagonal two first groups corresponding to the region Asia-Pacifica \((G_{31},G_{32}, G_{33})\) and the region Americas \((G_{21},G_{22}, G_{23})\). But the next region is limited to only two members of the Rest of the World \((G_{42},G_{43})\), because Russia \((G_{41})\) seems to be more associated with the region Europe-Mediterranea-Africa \((G_{11},G_{12}, G_{13})\).
We can try to build a first regional model that assumes the existence of a simple preference effect with the same value \(\gamma\) for units located inside the same region:
\(F_{ij}^* = a_i.O_i.b_j.D_j.\gamma^{REG}+\epsilon_{ij}\)
Despite the analysis of the residuals, we decide to keep the partition in 4 regions proposed by the experts.
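The indicator \(REG\) used in this model can be derived directly from a partition of the 12 units. The sketch below assumes that the first digit of each unit code gives its expert region (as in \(G_{11}\), \(G_{12}\), \(G_{13}\) for Europe-Mediterranea-Africa); the names `EXPERT_PARTITION` and `same_region` are hypothetical.

```python
# Expert partition: the first digit of the code is taken as the region
EXPERT_PARTITION = {
    "G11": 1, "G12": 1, "G13": 1,
    "G21": 2, "G22": 2, "G23": 2,
    "G31": 3, "G32": 3, "G33": 3,
    "G41": 4, "G42": 4, "G43": 4,
}


def same_region(i, j, partition):
    """Return 1 if units i and j belong to the same region, else 0."""
    return 1 if partition[i] == partition[j] else 0


# One REG value per ordered pair of distinct units (132 cells)
units = sorted(EXPERT_PARTITION)
reg = {(i, j): same_region(i, j, EXPERT_PARTITION)
       for i in units for j in units if i != j}
print(sum(reg.values()))  # 24 intra-regional cells out of 132
```

Testing another partition then only requires changing the dictionary, for example by assigning `"G41"` to region 1.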
Analysis of Deviance Table
Model: poisson, link: log
Response: Vij
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 131 2101056
i 11 815299 120 1285757 < 2.2e-16 ***
j 11 1014053 109 271704 < 2.2e-16 ***
REG 1 160356 108 111348 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Total) = 0.947"
Analysis of Deviance Table
Model 1: Vij ~ i + j
Model 2: Vij ~ i + j + REG
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 109 271704
2 108 111348 1 160356 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Gain) = 0.59"
We obtain a model with a pseudo R-square equal to 95 % of deviance explained (including the effect of the null model) or 59 % of the residual deviance of the reference model (excluding therefore what has already been explained by the double constraint on origins and destinations). The coefficient \(\gamma\) is highly significant and equal to 3.02, which means that exchanges between units located in the same region are on average 3 times greater than exchanges between units located in different regions.
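Both pseudo R-square values can be reproduced from the deviances reported in the tables as one minus a ratio of residual deviances. The function name below is hypothetical; the numbers are those of the deviance tables above.

```python
def pseudo_r2(dev_model, dev_baseline):
    """Share of the baseline deviance removed by the model."""
    return 1 - dev_model / dev_baseline


NULL_DEV = 2101056   # deviance of the empty model
REF_DEV = 271704     # residual deviance of the reference model (i + j)
REG_DEV = 111348     # residual deviance of the model with REG

total = pseudo_r2(REG_DEV, NULL_DEV)  # measured against the null model
gain = pseudo_r2(REG_DEV, REF_DEV)    # measured against the reference model
print(round(total, 3), round(gain, 3))  # 0.947 0.59
```

The "total" value measures the model against the empty model, while the "gain" value isolates what the regional effect adds once the sizes of origins and destinations are controlled.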
We can adopt a different perspective and imagine that there are as many values of the parameter \(\gamma_{r}\) as there are regions \(r\) to which both units can belong (the index \(r\) is used here rather than \(k\), which already denotes the type of product). Our model will therefore take the form
\(F_{ij}^* = a_i.O_i.b_j.D_j.\gamma_{r}^{REG_{r}}+\epsilon_{ij}\)
Analysis of Deviance Table
Model: poisson, link: log
Response: Vij
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 131 2101056
i 11 815299 120 1285757 < 2.2e-16 ***
j 11 1014053 109 271704 < 2.2e-16 ***
REG1 1 66272 108 205433 < 2.2e-16 ***
REG2 1 71893 107 133540 < 2.2e-16 ***
REG3 1 30040 106 103500 < 2.2e-16 ***
REG4 1 467 105 103034 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Total) = 0.951"
Analysis of Deviance Table
Model 1: Vij ~ i + j
Model 2: Vij ~ i + j + REG
Model 3: Vij ~ i + j + REG1 + REG2 + REG3 + REG4
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 109 271704
2 108 111348 1 160356 < 2.2e-16 ***
3 105 103034 3 8314 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Gain) = 0.621"
This model now accounts for 95.1 % of the total deviance and 62.1 % of the residual deviance of the reference model. It offers a significant improvement over the previous model and reveals that the levels of integration are different in each region. The most integrated regions are Europe-Mediterranea-Africa (\(\gamma_1=3.72\)) and Americas (\(\gamma_2=3.84\)), followed by Asia-Pacifica (\(\gamma_3=2.45\)) and finally the Rest of the World (\(\gamma_4=1.36\)).
In the previous analysis we followed the expert advice concerning the division of the world into 4 regions. But we can ask whether this choice was really optimal. Looking at the residuals of the reference model, we can imagine another partition of the world into four groups where Russia is associated with the region Europe-Mediterranea-Africa. What would be the result?
Analysis of Deviance Table
Model: poisson, link: log
Response: Vij
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 131 2101056
i 11 815299 120 1285757 < 2.2e-16 ***
j 11 1014053 109 271704 < 2.2e-16 ***
REG1 1 113899 108 157805 < 2.2e-16 ***
REG2 1 64655 107 93150 < 2.2e-16 ***
REG3 1 24385 106 68765 < 2.2e-16 ***
REG4 1 3706 105 65059 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Total) = 0.969"
Analysis of Deviance Table
Model 1: Vij ~ i + j
Model 2: Vij ~ i + j + REG
Model 3: Vij ~ i + j + REG1 + REG2 + REG3 + REG4
Model 4: Vij ~ i + j + REG1 + REG2 + REG3 + REG4
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 109 271704
2 108 111348 1 160356 < 2.2e-16 ***
3 105 103034 3 8314 < 2.2e-16 ***
4 105 65059 0 37975
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] "Mc Fadden Pseudo R-square (Gain) = 0.761"
This model now accounts for 96.9 % of the total deviance and 76.1 % of the residual deviance of the reference model. It offers a substantial improvement over the previous model (the residual deviance falls from 103034 to 65059 with the same number of parameters) and modifies the level of integration of each region. The integration of Europe-Mediterranea-Africa extended to Russia is increased (\(\gamma_1=4.74\)) and a small decrease is observed in Americas (\(\gamma_2=3.54\)) and in Asia-Pacifica (\(\gamma_3=2.24\)), but we observe a strong increase of integration in the remaining part of the Rest of the World (\(\gamma_4=3.1\), against 1.36 previously).
We now decide to replicate the model with one preference parameter per region (expert partition in 4 regions) for each of the time periods, in order to examine the variations of regional integration.
Analysis of Deviance Table
Model: poisson, link: log
Response: Vijkt
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 5939 12089700
i:t 59 3362205 5880 8727495 < 2.2e-16 ***
t:j 55 4213874 5825 4513621 < 2.2e-16 ***
t:REG1 5 273136 5820 4240485 < 2.2e-16 ***
t:REG2 5 323063 5815 3917422 < 2.2e-16 ***
t:REG3 5 198226 5810 3719196 < 2.2e-16 ***
t:REG4 5 10156 5805 3709039 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
| | reg | 1971-80 | 1981-90 | 1991-00 | 2001-10 | 2011-20 |
|---|---|---|---|---|---|---|
| Eur-Med-Afr | REG1 | 2.87 | 3.76 | 3.72 | 2.35 | 2.25 |
| Americas | REG2 | 3.03 | 2.88 | 3.84 | 4.50 | 4.49 |
| Asia-Pacifica | REG3 | 4.29 | 3.11 | 2.45 | 2.76 | 2.82 |
| Rest of the World | REG4 | 0.44 | 0.71 | 1.35 | 1.26 | 1.46 |
Comment: The introduction of time reveals variations of regional integration. For example, the region Eur-Med-Afr reaches its maximum integration in 1981-1990 and 1991-2000, with lower levels before and after. The region Americas, on the contrary, reaches its maximum integration in the final periods of 2001-2010 and 2011-2020. The region Asia-Pacifica was very integrated in 1971-80 and experienced a decrease until 1991-2000 before slowly increasing again.
Here, we replicate the same model with one preference parameter per region for the final time period 2011-2020, but we examine the level of integration separately for each group of products.
Analysis of Deviance Table
Model: poisson, link: log
Response: Vijkt
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 1187 2155633
i:k 107 828616 1080 1327018 < 2.2e-16 ***
k:j 99 909782 981 417236 < 2.2e-16 ***
k:REG1 9 38441 972 378794 < 2.2e-16 ***
k:REG2 9 103555 963 275239 < 2.2e-16 ***
k:REG3 9 48475 954 226764 < 2.2e-16 ***
k:REG4 9 7590 945 219174 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
| | reg | AGR | CHE | ELE | ENE | EQU | MIN | MIS | TEX | TRA |
|---|---|---|---|---|---|---|---|---|---|---|
| Eur-Med-Afr | REG1 | 2.34 | 1.64 | 5.07 | 3.97 | 2.33 | 2.12 | 0.60 | 3.10 | 2.98 |
| Americas | REG2 | 2.18 | 3.28 | 6.50 | 22.40 | 4.38 | 2.14 | 12.16 | 4.53 | 4.36 |
| Asia-Pacifica | REG3 | 2.62 | 4.13 | 1.12 | 4.73 | 2.81 | 7.16 | 7.45 | 1.41 | 2.16 |
| Rest of the World | REG4 | 1.91 | 1.47 | 1.31 | 0.40 | 0.81 | 1.08 | 3.33 | 1.11 | 1.06 |
The sequence of models indicates that a simple validation of an existing partition does not guarantee that we have found the optimal solution. In our example, we should certainly explore all the possible partitions before validating our final model as the best partition of world trade into 4 regions.
We also have to consider that the decision to choose 4 regions is not necessarily optimal, and we could imagine that more interesting results could be achieved with a partition into 2, 3 or 5 regions. But in this case we have to propose an optimisation criterion like AIC or BIC, which takes into account the number of classes used. Finally, our results suggest:
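To give an order of magnitude for such an exhaustive exploration, the number of ways to split 12 units into exactly 4 non-empty groups is a Stirling number of the second kind. The short sketch below (the function name is ours) computes it with the usual recurrence \(S(n,k) = k.S(n-1,k) + S(n-1,k-1)\).

```python
def stirling2(n, k):
    """Number of partitions of n labelled units into k non-empty groups."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for m in range(1, n + 1):
        for g in range(1, min(m, k) + 1):
            # Unit m either joins one of the g existing groups
            # or opens a new group on its own
            table[m][g] = g * table[m - 1][g] + table[m - 1][g - 1]
    return table[n][k]


print(stirling2(12, 4))  # 611501
```

Fitting one model for each of these 611501 partitions is still feasible for a single 12 x 12 matrix, but the count grows very quickly with the number of units, which is why a complete enumeration cannot be the general strategy for the full database.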
In other words, the question of optimal regionalisation is very complex but also very exciting …